[VLM][Core] Support profiling with multiple multi-modal inputs per prompt #7126
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).
@DarkLight1337 Left a few comments - PTAL!
```diff
@@ -180,6 +181,7 @@ def __init__(
     log_stats: bool,
     usage_context: UsageContext = UsageContext.ENGINE_CONTEXT,
     stat_loggers: Optional[Dict[str, StatLoggerBase]] = None,
+    input_registry: InputRegistry = INPUT_REGISTRY,
```
Why do we need to make this a parameter of `__init__`?
Compared to assigning the global `INPUT_REGISTRY` directly to the instance attribute, this makes it easier to see the dependencies of `LLMEngine`.
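As a rough illustration of the pattern (other `LLMEngine` parameters are omitted, and the import path is assumed from the identifiers in the diff):

```python
from vllm.inputs import INPUT_REGISTRY, InputRegistry


class LLMEngine:
    def __init__(
        self,
        input_registry: InputRegistry = INPUT_REGISTRY,
    ) -> None:
        # The registry is an explicit, injectable dependency: tests can
        # pass a stub registry, while the default argument preserves the
        # existing global behavior.
        self.input_registry = input_registry
```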
```diff
 if multimodal_config is None:
-    raise ValueError("Provide vision related configurations "
+    raise ValueError("Provide multi-modal related configurations "
```
Now looking at the previous piece of code, is it ever possible that `multimodal_config` is `None`? If not, then this should probably be `assert multimodal_config is not None`.
Yeah, it can't be `None` now. It's a holdover from the previous implementation of config... we can remove this in a later PR since quite a few files have to be changed.
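For reference, a minimal sketch of the suggested simplification (the assertion message is illustrative):

```python
# Since multimodal_config can no longer be None by the time this code
# runs, an assertion documents the invariant instead of raising a
# user-facing ValueError.
assert multimodal_config is not None, (
    "multimodal_config should have been set by the engine")
```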
```python
input_registry: InputRegistry = INPUT_REGISTRY,
mm_registry: MultiModalRegistry = MULTIMODAL_REGISTRY,
```
Ditto for having these as `__init__` parameters.
LGTM!
```python
.get_max_multimodal_tokens(model_config)
input_registry = self.input_registry
mm_registry = self.mm_registry
mm_registry.init_mm_limits_per_prompt(model_config, mm_config)
```
How about moving `mm_registry.init_mm_limits_per_prompt` into the model runner's `__init__` phase? Some model runners don't have a profiling run phase, such as `enc_dec_model_runner` and `xpu_model_runner`.
I think that's a good point - I assume this is regarding generating embeddings from an LMM? WDYT? @DarkLight1337
Yeah, it should be fine to move it to `__init__`. Can you also implement this in #7530?
Perhaps we need to factor out the profiling + input mapping logic into its own class, so that `_limits_by_model` is tracked somewhere close to the model runner instead of inside `MultiModalRegistry` itself.
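A hypothetical sketch of that refactoring, where all names are illustrative rather than actual vLLM classes:

```python
from typing import Mapping


class MultiModalLimitsTracker:
    """Hypothetical helper owning per-prompt limits and their checks,
    keeping this state near the model runner instead of inside
    MultiModalRegistry."""

    def __init__(self, limits_per_prompt: Mapping[str, int]) -> None:
        # e.g. {"image": 2}, as configured via --limit-mm-per-prompt
        self._limits_by_modality = dict(limits_per_prompt)

    def check(self, modality: str, num_items: int) -> None:
        limit = self._limits_by_modality.get(modality, 1)
        if num_items > limit:
            raise ValueError(
                f"Got {num_items} {modality} item(s) in one prompt, "
                f"but the configured limit is {limit}")
```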
Yea I'm doing it in #7530 @AllenDou @DarkLight1337
Hi~ Does vLLM support multiple image inputs now?
@xyfZzz Not yet - this PR itself allows profiling with multiple image inputs, but there are still a few things we need to do to enable multi-image input for inference. Stay tuned!
Thanks! Since another three weeks have passed, I would like to ask whether vLLM now supports multiple image inputs.
Yes, it's supported now. Please check out the docs.
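For anyone landing here later, multi-image offline inference looks roughly like this (the model name and prompt are placeholders; the exact prompt template with image placeholders is model-specific, so check the multi-modal docs):

```python
from PIL import Image

from vllm import LLM

# Allow up to two images per prompt, using the limit introduced here.
llm = LLM(model="openbmb/MiniCPM-V-2_6",
          trust_remote_code=True,
          limit_mm_per_prompt={"image": 2})

image1 = Image.open("cat.jpg")
image2 = Image.open("dog.jpg")

outputs = llm.generate({
    # Real usage should apply the model's chat template, including its
    # per-image placeholder tokens.
    "prompt": "Compare these two images.",
    "multi_modal_data": {"image": [image1, image2]},
})
print(outputs[0].outputs[0].text)
```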
@DarkLight1337 Hi~ I installed the latest main branch of vLLM and deployed MiniCPM-V-2.6, but an error occurred when calling the OpenAI-style interface. Could you please help me find out why this error occurred?
I found the cause of the error. I should set …
The calculation of `get_max_multimodal_tokens` is designed for a single instance of multi-modal data (e.g. image), so it is inconsistent with dummy data when the dummy data contains multiple instances of multi-modal data.

To support the above case, this PR introduces the `--limit-mm-per-prompt` argument, which limits how many instances of multi-modal data are allowed per prompt. During profiling, the total number of multi-modal tokens for a given modality can be obtained by multiplying the result of `get_max_multimodal_tokens` by the corresponding limit (a worked example follows the checklist below).

Checklist:
- Update `MultiModalConfig` and CLI args with the new argument
- Apply the limit when generating dummy data for profiling (`InputRegistry.dummy_data_for_profiling`)
- Apply the limit in the multi-modal input mapper (`MultiModalRegistry.map_input`)
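To make the profiling arithmetic concrete, a toy calculation (the token count below is made up):

```python
# Suppose get_max_multimodal_tokens reports 576 tokens per image for
# some model, and the server was started with --limit-mm-per-prompt
# image=2.
max_tokens_per_image = 576
limit_mm_per_prompt = {"image": 2}

# Profiling assumes the worst case: every prompt carries the maximum
# allowed number of images.
total_image_tokens = max_tokens_per_image * limit_mm_per_prompt["image"]
assert total_image_tokens == 1152
```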